{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Querying Tutorial\n",
    "\n",
    "When querying, you can specify several arguments using `fast_graphrag.QueryParam`:\n",
    "- `only_context`: skips the answer generation step and simply returns the context retrieved from the graph that would have been passed to the LLM. This includes the nodes, edges and chunks selected, each associated with its relevance scored, as computed by page rank.\n",
    "- `with_references`: it is possible to ask the LLM to add inline references to the chunks (and thus documents) used to generate an answer.\n",
    "\n",
    "Furthermore it is possible to specify how many tokens to use for the context. The default values are:\n",
    "- `entities_max_tokens = 4000`: number of context tokens reserved for the retrieved nodes (i.e., entities) and their descriptions.\n",
    "- `relationships_max_tokens = 3000`: number of context tokens reserved for the retrieved edges (i.e., relations between entities) and their descriptions.\n",
    "- `chunks_max_tokens = 9000`: number of context tokens reserved for the retrieved chunks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Querying with `only_context`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from fast_graphrag import GraphRAG, QueryParam\n",
    "\n",
    "working_dir = \"...\"\n",
    "DOMAIN = \"...\"\n",
    "QUERIES = [\"...\", \"...\"]\n",
    "ENTITY_TYPES = [\"...\", \"...\"]\n",
    "\n",
    "grag = GraphRAG(\n",
    "    working_dir=working_dir,\n",
    "    domain=DOMAIN,\n",
    "    example_queries=\"\\n\".join(QUERIES),\n",
    "    entity_types=ENTITY_TYPES,\n",
    ")\n",
    "\n",
    "# Inserting some data\n",
    "corpus = {\n",
    "    \"title1\": \"corpus1 ...\",\n",
    "    \"title2\": \"corpus2 ...\",\n",
    "}\n",
    "grag.insert(\n",
    "    [f\"{title}: {corpus}\" for title, corpus in tuple(corpus.items())],\n",
    "    metadata=[{\"id\": title} for title in tuple(corpus.keys())],\n",
    ")\n",
    "\n",
    "# Querying only context\n",
    "query = \"...\"\n",
    "answer = grag.query(query, QueryParam(only_context=True))\n",
    "\n",
    "print(answer.response)  # \"\"\n",
    "print(answer.context)  # {entities: [...], relations: [...], chunks: [...]}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Querying with `with_references`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from fast_graphrag import GraphRAG, QueryParam\n",
    "\n",
    "working_dir = \"...\"\n",
    "DOMAIN = \"...\"\n",
    "QUERIES = [\"...\", \"...\"]\n",
    "ENTITY_TYPES = [\"...\", \"...\"]\n",
    "\n",
    "grag = GraphRAG(\n",
    "    working_dir=working_dir,\n",
    "    domain=DOMAIN,\n",
    "    example_queries=\"\\n\".join(QUERIES),\n",
    "    entity_types=ENTITY_TYPES,\n",
    ")\n",
    "\n",
    "# Inserting some data\n",
    "corpus = {\n",
    "    \"title1\": \"corpus1 ...\",\n",
    "    \"title2\": \"corpus2 ...\",\n",
    "}\n",
    "grag.insert(\n",
    "    [f\"{title}: {corpus}\" for title, corpus in tuple(corpus.items())],\n",
    "    metadata=[{\"id\": title} for title in tuple(corpus.keys())],\n",
    ")\n",
    "\n",
    "# Querying only context\n",
    "query = \"...\"\n",
    "answer = grag.query(query, QueryParam(with_references=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`answer` should now contain inline references in the form of `[<source_id>]`, where `source_id - 1` is the index of the corresponding chunk in `answer.context.chunks`. Each chunk comes with its own metadata, so you already extract all the information you want by parsing `answer.response`.\n",
    "However, we provide a method that does that for you (and hopefully covers 99% of the use cases): `answer.format_references`.\n",
    "`answer.format_references` takes a `format_fn` which allows you to specify what to write for each reference, in a tidy and easy way. Let's go through a couple of examples.\n",
    "NOTE: if your use case is not covered by this, please feel free to create a feature request (or even a pull request) on GitHub, we'll be happy to have a look!\n",
    "\n",
    "Let's consider the following query, answer and associated context (this is taken from the 2wikimultihopqa dataset):\n",
    "\n",
    "```json\n",
    "\"query\": \"What is the place of birth of the performer of song Changed It?\",\n",
    "\"answer\": \"The performer of the song \\\"Changed It\\\" is Nicki Minaj, who was born in Saint James, Port of Spain, Trinidad and Tobago, and was raised in Queens, New York City [1][4].\",\n",
    "\"chunks\": [\n",
    "    [\n",
    "        {\n",
    "            \"meta\": {\n",
    "                \"id\": \"Changed It\"\n",
    "            },\n",
    "            \"content\": \"Changed It: \\\"Changed It\\\" is a song by Trinidadian-American rapper and singer Nicki Minaj and American rapper Lil Wayne...\"\n",
    "        },\n",
    "        2.9243156630545855\n",
    "    ],\n",
    "    [\n",
    "        {\n",
    "            \"meta\": {\n",
    "                \"id\": \"Frank Sinatra\"\n",
    "            },\n",
    "            \"content\": \"Frank Sinatra: Francis Albert Sinatra  ( December 12, 1915 \\u2013 May 14, 1998) was an American singer...\"\n",
    "        },\n",
    "        0.3308500498533249\n",
    "    ],\n",
    "    [\n",
    "        {\n",
    "            \"meta\": {\n",
    "                \"id\": \"Andrew Allen (singer)\"\n",
    "            },\n",
    "            \"content\": \"Andrew Allen (singer): Andrew Allen( born 6 May 1981) is a Canadian- born singer- songwriter from Vernon, British Columbia...\"\n",
    "        },\n",
    "        0.17779897525906563\n",
    "    ],\n",
    "    [\n",
    "        {\n",
    "            \"meta\": {\n",
    "                \"id\": \"Nicki Minaj\"\n",
    "            },\n",
    "            \"content\": \"Nicki Minaj: Onika Tanya Maraj-Petty (born December 8, 1982), known professionally as Nicki Minaj...\"\n",
    "        },\n",
    "        1.392689185217023\n",
    "    ],\n",
    "    [\n",
    "        {\n",
    "            \"meta\": {\n",
    "                \"id\": \"Revolution (Jars of Clay song)\"\n",
    "            },\n",
    "            \"content\": \"Revolution (Jars of Clay song): \\\" Revolution\\\" is a song written and performed by Jars of Clay...\"\n",
    "        },\n",
    "        0.13545062253251672\n",
    "    ]\n",
    "]\n",
    "```\n",
    "Please note a couple of things about the example above:\n",
    "- Only retrieved chunks are shown (no entities or relationships even if those are provided to the LLM when generating the response);\n",
    "- Chunk content is truncated to make it easy to visualize the list;\n",
    "- Chunk \"Nicki Minaj\" has been manually moved from position #2 to position #4, and the corresponding reference in the answer has been changed from \"[2]\" to \"[4]\", to enphasize that `format_references` will automatically order the references for you;\n",
    "- as mentioned early, reference indices start from `1`.\n",
    "\n",
    "Let's now use `format_references`. `format_references` takes three parameters:\n",
    "- `document_index`: a automatically generated increasing index associated to each referenced document;\n",
    "- `chunk_indices`: a list of chunk indices associated with the document;\n",
    "- `metadata`: the metadata dictionary associated to each document.\n",
    "`document`s are created by aggregating all the chunks with the same metadata dictionary. In our case, each chunk is its own document since they all have different metadatas."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Example #1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fresponse, fcontext = answer.format_references(\n",
    "    lambda doc_index, chunk_indices, metadata: f\"[{doc_index}.{'-'.join(map(str, chunk_indices))}]({metadata['id']})\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This output the following:\n",
    "\n",
    "```json\n",
    "{\n",
    "    \"fresponse\": \"The performer of the song \\\"Changed It\\\" is Nicki Minaj, who was born in Saint James, Port of Spain, Trinidad and Tobago, and was raised in Queens, New York City [1.1](Changed It)[2.1](Nicki Minaj).\",\n",
    "    \"fcontext\": {\n",
    "            \"1\": {\n",
    "                \"meta\": {\n",
    "                    \"id\": \"Changed It\"\n",
    "                },\n",
    "                \"chunks\": {\n",
    "                    \"1\": [\"Changed It: \\\"Changed It\\\" is a song by Trinidadian-American...\", 3258430810805445470]\n",
    "                }\n",
    "            },\n",
    "            \"2\": {\n",
    "                \"meta\": {\n",
    "                    \"id\": \"Nicki Minaj\"\n",
    "                },\n",
    "                \"chunks\": {\n",
    "                    \"1\": [\"Nicki Minaj: Onika Tanya Maraj-Petty (born December 8, 1982)...\", 18375001403800070367]\n",
    "                }\n",
    "            }\n",
    "        }\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If, instead, both \"Changed It\" and \"Nicki Minaj\" were from the same document with metadata `{\"url\": \"wikipedia.com/changed-it-nicki-minaj\"}` we could get:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fresponse, fcontext = answer.format_references(\n",
    "    lambda doc_index, chunk_indices, metadata: f\"[{doc_index}]({metadata['url']})\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```json\n",
    "{\n",
    "    \"fresponse\": \"The performer of the song \\\"Changed It\\\" is Nicki Minaj, who was born in Saint James, Port of Spain, Trinidad and Tobago, and was raised in Queens, New York City [1](wikipedia.com/changed-it-nicki-minaj).\",\n",
    "    \"fcontext\": {\n",
    "            \"1\": {\n",
    "                \"meta\": {\n",
    "                    \"url\": \"wikipedia.com/changed-it-nicki-minaj\"\n",
    "                },\n",
    "                \"chunks\": {\n",
    "                    \"1\": [\"Changed It: \\\"Changed It\\\" is a song by Trinidadian-American...\", 3258430810805445470],\n",
    "                    \"2\": [\"Nicki Minaj: Onika Tanya Maraj-Petty (born December 8, 1982)...\", 18375001403800070367]\n",
    "                }\n",
    "            }\n",
    "        }\n",
    "}\n",
    "```\n",
    "\n",
    "Note that the formatting function has been arbitrarely changed to only output the document index and ignore the chunk_indices."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "cm",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
